Adds information about cooldown periods for trained model autoscaling in Serverless #2498
LGTM! 🦖
Very nice!
deploy-manage/cloud-organization/billing/elasticsearch-billing-dimensions.md
@prwhelan Can you review for technical accuracy? Thx!
* When using the inference API for {{es}} or ELSER, [enable `adaptive_allocations`](../../autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations).

::::{note}
In {{serverless-short}}, trained model deployments scale down to zero only after 24 hours without any inference requests. After scaling up, they remain active for 5 minutes before they can scale down again. During these cooldown periods, you will continue to be billed for the active resources.
This is true outside of serverless as well. All environments will now wait 24 hours before scaling to zero: elastic/elasticsearch#128914
Outside of serverless, this can be modified using `xpack.ml.trained_models.adaptive_allocations.scale_to_zero_time` to a minimum of one minute.
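To make the timing rules in this thread concrete, here is a minimal Python sketch that models them. It is an illustration only, not Elasticsearch's implementation: the function name, parameter names, and constants are hypothetical, and it assumes that both windows (24 hours without requests by default, and 5 minutes after a scale-up) must have elapsed before a deployment can scale to zero.

```python
from datetime import datetime, timedelta

# Hypothetical constants, taken from the figures quoted in this thread.
DEFAULT_SCALE_TO_ZERO_TIME = timedelta(hours=24)
MIN_SCALE_TO_ZERO_TIME = timedelta(minutes=1)
SCALE_UP_COOLDOWN = timedelta(minutes=5)


def can_scale_to_zero(now, last_request, last_scale_up,
                      scale_to_zero_time=DEFAULT_SCALE_TO_ZERO_TIME):
    """Return True if both cooldown windows have elapsed (illustrative model only)."""
    # Assumption: values below the one-minute floor are rejected.
    if scale_to_zero_time < MIN_SCALE_TO_ZERO_TIME:
        raise ValueError("scale_to_zero_time must be at least one minute")
    idle_long_enough = now - last_request >= scale_to_zero_time
    past_scale_up_cooldown = now - last_scale_up >= SCALE_UP_COOLDOWN
    return idle_long_enough and past_scale_up_cooldown
```

Under this model, a deployment that served its last request 23 hours ago stays active (and billable) with the default window, while one configured with a one-minute window can scale to zero once it is also more than 5 minutes past its last scale-up.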
Hi @prwhelan, thanks a lot for your feedback! I've modified my PR based on it, along with a few other smaller changes:
- Trained model autoscaling: I moved the cooldown period information into its own heading. This makes it easier to highlight and also allows other pages to link directly to this specific section.
- Autoscaling: I felt that going into the details of cooldown periods here would be out of scope and make the page a bit overwhelming. Instead, I added a more concise sentence that links to the new Cooldown periods section on the Trained model autoscaling page.
- Elasticsearch billing dimensions: Realizing that this page is only applicable to Serverless, I updated the description for the Machine learning trained model autoscaling bullet point to reflect the new autoscaling behavior in Serverless.
Please let me know if you think these changes are appropriate or if you’d like me to adjust anything.
Thanks again!
This PR adds information about cooldown periods for trained model autoscaling in serverless projects.
Changes
Related issue: https://github.com/elastic/docs-content-internal/issues/177